HealthData@EU Pilot - Sciensano Use Case: Population uptake metrics: COVID-19 test positivity, vaccination and hospitalization

Quality Analysis

Overview

This section provides an overview of the imported dataset. Dataset statistics, variable types, a missing data profile and potential alerts are shown below.

Discrete variables: 23
Continuous variables: 4
All-missing variables: 0


Missing: exitus_dt has 181090 (90.5%) missing values
Missing: dose_3_brand_cd has 181236 (90.6%) missing values
Missing: dose_3_dt has 181164 (90.6%) missing values
Missing: fully_vaccinated_dt has 183793 (91.9%) missing values
Not unique: the variable ‘person_id’ does not have all unique values (number of duplicate values: 9999)
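
The alerts above report a missing-value count and share per variable, plus a duplicate count for the identifier. As a rough illustration only, the Python sketch below shows one way such figures could be computed. The toy records are hypothetical and not taken from the pilot dataset, and counting a duplicate as every repeat of an already seen value is an assumption; the report does not state how its tooling counts duplicates.

```python
from collections import Counter


def missing_profile(rows, column):
    """Return (count, percentage) of missing (None) values in one column."""
    total = len(rows)
    missing = sum(1 for row in rows if row.get(column) is None)
    pct = 100.0 * missing / total if total else 0.0
    return missing, pct


def duplicate_count(rows, column):
    """Count values that repeat an earlier value (assumed definition)."""
    counts = Counter(row[column] for row in rows)
    return sum(n - 1 for n in counts.values())


# Hypothetical toy records, for illustration only.
records = [
    {"person_id": "A", "exitus_dt": None},
    {"person_id": "B", "exitus_dt": "2021-03-01"},
    {"person_id": "A", "exitus_dt": None},
    {"person_id": "C", "exitus_dt": None},
]

missing, pct = missing_profile(records, "exitus_dt")
dupes = duplicate_count(records, "person_id")
print(f"exitus_dt has {missing} ({pct:.1f}%) missing values")
print(f"person_id: number of duplicate values: {dupes}")
```

On real data the same two functions would be applied to every column, flagging those whose missing share exceeds a chosen threshold.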

Variables

This section provides more detailed information per variable in the imported dataset.

Class of the variable: character

More than 100 distinct values

Class of the variable: character
Class of the variable: integer

More than 100 distinct values

Histogram drawn with the default 30 bins; 10000 rows with missing or non-finite values excluded.
Class of the variable: character
Class of the variable: Date

More than 100 distinct values

Histogram drawn with the default 30 bins; 181090 rows with missing or non-finite values excluded.
Class of the variable: logical
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character

More than 100 distinct values

Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: character
Class of the variable: logical
Class of the variable: integer
Histogram drawn with the default 30 bins; 10000 rows with missing or non-finite values excluded.
Class of the variable: integer
Histogram drawn with the default 30 bins; 10000 rows with missing or non-finite values excluded.
Class of the variable: character
Class of the variable: Date

More than 100 distinct values

Histogram drawn with the default 30 bins; 10000 rows with missing or non-finite values excluded.
Class of the variable: character
Class of the variable: Date

More than 100 distinct values

Histogram drawn with the default 30 bins; 28937 rows with missing or non-finite values excluded.
Class of the variable: character
Class of the variable: Date

More than 100 distinct values

Histogram drawn with the default 30 bins; 181164 rows with missing or non-finite values excluded.
Class of the variable: integer
Histogram drawn with the default 30 bins; 10000 rows with missing or non-finite values excluded.
Class of the variable: Date

More than 100 distinct values

Histogram drawn with the default 30 bins; 183793 rows with missing or non-finite values excluded.
Class of the variable: logical
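
The histograms for the integer and date variables in this section use 30 bins and leave out rows whose value is missing or non-finite. The Python sketch below illustrates that idea with equal-width bins. It is a simplified stand-in and does not reproduce the exact bin placement of the plotting library used for the report; the function name `histogram` and the sample values are hypothetical.

```python
import math


def histogram(values, bins=30):
    """Count finite values in equal-width bins; also report how many rows were dropped."""
    finite = [v for v in values if v is not None and math.isfinite(v)]
    removed = len(values) - len(finite)
    if not finite:
        return [], removed
    lo, hi = min(finite), max(finite)
    # Fall back to a width of 1 when all values are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in finite:
        # The maximum value would land one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, removed


# Hypothetical ages with one missing and one non-finite entry.
ages = [20, 35, None, 50, float("nan"), 80]
counts, removed = histogram(ages, bins=3)
print(counts, removed)
```

Returning the number of dropped rows alongside the counts keeps the information that the plotting warnings would otherwise carry.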

Compliance with the Common Data Model specification

We check whether the imported dataset complies with the data model specification (https://docs.google.com/spreadsheets/d/1Eva2ucg_M0WaDkCaF7qfBxk2DwTlUac9gKuP3xck4rw/edit#gid=0).

To comply with the data model, the dataset must pass a set of validation rules. The data are tested against these rules, and the results are summarized in the table below.

Rule  Items   Passes  Fails  % fails  NAs   % NAs  Error  Warning
V01   200000  200000  0      0%       0     0%     FALSE  FALSE
V02   200000  170384  29616  14.81%   0     0%     FALSE  FALSE
V03   200000  200000  0      0%       0     0%     FALSE  FALSE
V04   200000  200000  0      0%       0     0%     FALSE  FALSE
V05   200000  200000  0      0%       0     0%     FALSE  FALSE
V06   200000  200000  0      0%       0     0%     FALSE  FALSE
V07   200000  200000  0      0%       0     0%     FALSE  FALSE
V08   200000  200000  0      0%       0     0%     FALSE  FALSE
V09   200000  200000  0      0%       0     0%     FALSE  FALSE
V10   200000  200000  0      0%       0     0%     FALSE  FALSE
V11   200000  200000  0      0%       0     0%     FALSE  FALSE
V12   200000  200000  0      0%       0     0%     FALSE  FALSE
V13   200000  200000  0      0%       0     0%     FALSE  FALSE
V14   200000  200000  0      0%       0     0%     FALSE  FALSE
V15   200000  200000  0      0%       0     0%     FALSE  FALSE
V16   200000  110188  89812  44.91%   0     0%     FALSE  FALSE
V17   200000  190007  9993   5%       0     0%     FALSE  FALSE
V18   200000  199194  806    0.4%     0     0%     FALSE  FALSE
V19   200000  163140  27323  13.66%   9537  4.77%  FALSE  FALSE
V20   200000  190488  9512   4.76%    0     0%     FALSE  FALSE
V21   200000  175638  24362  12.18%   0     0%     FALSE  FALSE
V22   200000  195618  4382   2.19%    0     0%     FALSE  FALSE

Validation rule per rule name:

V01: is.na(sex_cd) | sex_cd %vin% c("0", "1", "2", "9")
V02: is.na(age_nm) | age_nm - 18 >= -1e-08 & age_nm - 115 <= 1e-08
V03: is.na(age_cd) | age_cd %vin% c("0-18", "18-25", "25-35", "35-45", "45-55", "55-65", "65-75", "75-85", "85-95", "95-105", "105-115")
V04: is.na(exitus_bl) | exitus_bl %vin% c(TRUE, FALSE)
V05: is.na(education_level_cd) | education_level_cd %vin% c("Low", "Middle", "High")
V06: is.na(income_category_cd) | income_category_cd %vin% c("Low", "Middle", "High")
V07: is.na(migration_background_cd) | migration_background_cd %vin% c("NATIVE", "EU", "NON-EU", "PAR")
V08: is.na(household_type_cd) | household_type_cd %vin% c("ALONE", "COUPLE", "COUPLE_CHILD", "LONE", "EXTENDED", "OTHER")
V09: is.na(hospi_due_to_covid_bl) | hospi_due_to_covid_bl %vin% c(TRUE, FALSE)
V10: is.na(test_positive_to_covid_nm) | test_positive_to_covid_nm - 0 >= -1e-08 & test_positive_to_covid_nm - 50 <= 1e-08
V11: is.na(test_nm) | test_nm - 0 >= -1e-08 & test_nm - 50 <= 1e-08
V12: is.na(dose_1_brand_cd) | dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V13: is.na(dose_2_brand_cd) | dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V14: is.na(dose_3_brand_cd) | dose_3_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V15: is.na(doses_nm) | doses_nm - 0 >= -1e-08 & doses_nm - 10 <= 1e-08
V16: (is.na(dose_1_dt) & is.na(dose_2_dt)) | is.na(dose_2_dt) | !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)
V17: (is.na(dose_2_dt) & is.na(dose_3_dt)) | is.na(dose_3_dt) | !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)
V18: is.na(fully_vaccinated_dt) | is.na(exitus_dt) | !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt
V19: (!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & doses_nm - 3 >= -1e-08) | (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 2) <= 1e-08) | (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 1) <= 1e-08) | (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 0) <= 1e-08)
V20: is.na(dose_1_dt) | (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V21: is.na(dose_2_dt) | (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V22: is.na(dose_3_dt) | (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
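
Most rules above start with an `is.na(...)` guard, so a missing value counts as a pass, while V19 has no such guard and can evaluate to NA when `doses_nm` is missing. The Python sketch below illustrates how passes, fails and NAs could be tallied under that three-valued logic. It is an illustration written for this note, not the validation code behind the report: `None` stands in for NA, `rule_doses` is a deliberately reduced one-dose variant of V19, and all helper names and sample records are hypothetical.

```python
def k_and(*vals):
    """Three-valued AND: False dominates, otherwise None (NA) if any operand is None."""
    if any(v is False for v in vals):
        return False
    if any(v is None for v in vals):
        return None
    return True


def k_or(*vals):
    """Three-valued OR: True dominates, otherwise None (NA) if any operand is None."""
    if any(v is True for v in vals):
        return True
    if any(v is None for v in vals):
        return None
    return False


def in_range(x, lo, hi, tol=1e-8):
    """NA-aware version of `x - lo >= -tol & x - hi <= tol`."""
    if x is None:
        return None
    return (x - lo >= -tol) and (x - hi <= tol)


def approx_eq(x, target, tol=1e-8):
    """NA-aware version of `abs(x - target) <= tol`."""
    if x is None:
        return None
    return abs(x - target) <= tol


def rule_age(rec):
    """V02-style rule: a missing age passes, otherwise age must lie in [18, 115]."""
    age = rec.get("age_nm")
    return k_or(age is None, in_range(age, 18, 115))


def rule_doses(rec):
    """Reduced V19-style rule (one dose only): the dose count must match the dose date."""
    has_d1 = rec.get("dose_1_dt") is not None
    doses = rec.get("doses_nm")
    return k_or(k_and(has_d1, approx_eq(doses, 1)),
                k_and(not has_d1, approx_eq(doses, 0)))


def summarise(rule, records):
    """Tally items, passes, fails and NAs for one rule."""
    results = [rule(r) for r in records]
    return {"items": len(results),
            "passes": sum(r is True for r in results),
            "fails": sum(r is False for r in results),
            "nas": sum(r is None for r in results)}


# Hypothetical records, for illustration only.
records = [
    {"age_nm": 40, "dose_1_dt": "2021-02-01", "doses_nm": 1},
    {"age_nm": 12, "dose_1_dt": None, "doses_nm": 0},
    {"age_nm": None, "dose_1_dt": "2021-02-01", "doses_nm": None},
    {"age_nm": 70, "dose_1_dt": None, "doses_nm": 1},
]
age_summary = summarise(rule_age, records)
doses_summary = summarise(rule_doses, records)
print(age_summary)
print(doses_summary)
```

In this toy data the third record passes the age rule through the missing-value guard but yields NA for the dose rule, which is the pattern that would produce a non-zero NA column for a rule without a guard.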

The vertical bars in the validation plot indicate the percentage of records ‘Passing’, ‘Failing’ and ‘Missing’.
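
The three shares shown per bar can be derived directly from the counts in the validation table. The short Python sketch below does this for rule V19, using the counts reported in the table; the function name `bar_shares` is hypothetical, and the sketch assumes each share is taken relative to the total number of items.

```python
def bar_shares(passes, fails, nas):
    """Return the percentage of items passing, failing and missing for one rule."""
    items = passes + fails + nas
    if items == 0:
        return {"Passing": 0.0, "Failing": 0.0, "Missing": 0.0}
    return {
        "Passing": 100.0 * passes / items,
        "Failing": 100.0 * fails / items,
        "Missing": 100.0 * nas / items,
    }


# Counts reported for rule V19 in the validation table.
v19 = bar_shares(passes=163140, fails=27323, nas=9537)
for label, pct in v19.items():
    print(f"{label}: {pct:.2f}%")
```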